Universality and predictability in molecular quantitative genetics
Molecular traits, such as gene expression levels or protein binding
affinities, are increasingly accessible to quantitative measurement by modern
high-throughput techniques. Such traits measure molecular functions and, from
an evolutionary point of view, are important as targets of natural selection.
We review recent developments in evolutionary theory and experiments that are
expected to become building blocks of a quantitative genetics of molecular
traits. We focus on universal evolutionary characteristics: these are largely
independent of a trait's genetic basis, which is often at least partially
unknown. We show that universal measurements can be used to infer selection on
a quantitative trait, which determines its evolutionary mode of conservation or
adaptation. Furthermore, universality is closely linked to predictability of
trait evolution across lineages. We argue that universal trait statistics
extend over a range of cellular scales and open new avenues of quantitative
evolutionary systems biology.
The size of the immune repertoire of bacteria
Some bacteria and archaea possess an immune system, based on the CRISPR-Cas
mechanism, that confers adaptive immunity against phage. In such species,
individual bacteria maintain a "cassette" of viral DNA elements called spacers
as a memory of past infections. The typical cassette contains a few dozen
spacers. Given that bacteria can have very large genomes, and since having more
spacers should confer a better memory, it is puzzling that so little genetic
space would be devoted by bacteria to their adaptive immune system. Here, we
identify a fundamental trade-off between the size of the bacterial immune
repertoire and effectiveness of response to a given threat, and show how this
trade-off imposes a limit on the optimal size of the CRISPR cassette.
Comment: 9 pages, 5 figures
Holographic-(V)AE: an end-to-end SO(3)-Equivariant (Variational) Autoencoder in Fourier Space
Group-equivariant neural networks have emerged as a data-efficient approach
to solve classification and regression tasks, while respecting the relevant
symmetries of the data. However, little work has been done to extend this
paradigm to the unsupervised and generative domains. Here, we present
Holographic-(V)AE (H-(V)AE), a fully end-to-end SO(3)-equivariant (variational)
autoencoder in Fourier space, suitable for unsupervised learning and generation
of data distributed around a specified origin. H-(V)AE is trained to
reconstruct the spherical Fourier encoding of data, learning in the process a
latent space with a maximally informative invariant embedding alongside an
equivariant frame describing the orientation of the data. We extensively test
the performance of H-(V)AE on diverse datasets and show that its latent space
efficiently encodes the categorical features of spherical images and structural
features of protein atomic environments. Our work can further be seen as a case
study for equivariant modeling of a data distribution by reconstructing its
Fourier encoding.
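To illustrate what an invariant of a spherical Fourier encoding can look like, the minimal sketch below uses a classical hand-crafted invariant, the per-degree power spectrum. This is not the learned embedding of H-(V)AE, and the function names are hypothetical. It assumes coefficients a_lm stored per degree l with m ordered from -l to l, and it only checks rotations about the z-axis, under which each a_lm acquires a phase exp(-i m alpha) (the sign depends on convention). General rotations would require Wigner D-matrices.

```python
import numpy as np

def rotate_about_z(coeffs, alpha):
    """Rotate spherical-harmonic coefficients about the z-axis by angle alpha.

    coeffs: dict mapping degree l -> complex array of length 2l+1 (m = -l..l).
    Assumed convention: a_lm -> exp(-i m alpha) * a_lm.
    """
    rotated = {}
    for l, a_l in coeffs.items():
        m = np.arange(-l, l + 1)
        rotated[l] = np.asarray(a_l) * np.exp(-1j * m * alpha)
    return rotated

def power_spectrum(coeffs):
    """Per-degree power sum_m |a_lm|^2, which is unchanged by rotations."""
    return {l: float(np.sum(np.abs(a_l) ** 2)) for l, a_l in coeffs.items()}

# Random coefficients up to degree 3 (illustrative data only)
rng = np.random.default_rng(0)
coeffs = {
    l: rng.normal(size=2 * l + 1) + 1j * rng.normal(size=2 * l + 1)
    for l in range(4)
}
rotated = rotate_about_z(coeffs, alpha=0.7)

original_power = power_spectrum(coeffs)
rotated_power = power_spectrum(rotated)
```

The coefficients themselves change under the rotation (they are equivariant), while the per-degree power does not (it is invariant).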
Deep generative selection models of T and B cell receptor repertoires with soNNia
Subclasses of lymphocytes carry different functional roles to work together
to produce an immune response and lasting immunity. In addition to these
functional roles, T- and B-cell lymphocytes rely on the diversity of their
receptor chains to recognize different pathogens. The lymphocyte subclasses
emerge from common ancestors generated with the same diversity of receptors
during selection processes. Here we leverage biophysical models of receptor
generation with machine learning models of selection to identify specific
sequence features characteristic of functional lymphocyte repertoires and
subrepertoires. Specifically, using only repertoire-level sequence information,
we classify CD4 and CD8 T-cells, find correlations between receptor
chains arising during selection, and identify T-cell subsets that are targets
of pathogenic epitopes. We also show examples of when simple linear classifiers
do as well as more complex machine learning methods.
Adaptive evolution of molecular phenotypes
Molecular phenotypes link genomic information with organismic functions,
fitness, and evolution. Quantitative traits are complex phenotypes that depend
on multiple genomic loci. In this paper, we study the adaptive evolution of a
quantitative trait under time-dependent selection, which arises from
environmental changes or through fitness interactions with other co-evolving
phenotypes. We analyze a model of trait evolution under mutations and genetic
drift in a single-peak fitness seascape. The fitness peak performs a
constrained random walk in the trait amplitude, which determines the
time-dependent trait optimum in a given population. We derive analytical
expressions for the distribution of the time-dependent trait divergence between
populations and of the trait diversity within populations. Based on this
solution, we develop a method to infer adaptive evolution of quantitative
traits. Specifically, we show that the ratio of the average trait divergence
and the diversity is a universal function of evolutionary time, which predicts
the stabilizing strength and the driving rate of the fitness seascape. From an
information-theoretic point of view, this function measures the
macro-evolutionary entropy in a population ensemble, which determines the
predictability of the evolutionary process. Our solution also quantifies two
key characteristics of adapting populations: the cumulative fitness flux, which
measures the total amount of adaptation, and the adaptive load, which is the
fitness cost due to a population's lag behind the fitness peak.
Comment: Figures are not optimally displayed in Firefox
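The divergence-to-diversity ratio described in this abstract can be sketched numerically. The abstract does not give the exact definitions or normalisation, so the code below makes explicit assumptions: divergence is the squared difference between the mean trait values of two populations, averaged over all pairs, and diversity is the within-population trait variance, averaged over populations. The function name is hypothetical.

```python
import numpy as np

def divergence_diversity_ratio(populations):
    """Ratio of average trait divergence to average trait diversity.

    populations: array of shape (n_populations, n_individuals) holding the
    trait value of each individual. Definitions here are assumptions:
      divergence = squared difference of population means, averaged over pairs
      diversity  = within-population variance, averaged over populations
    """
    pops = np.asarray(populations, dtype=float)
    means = pops.mean(axis=1)

    # All unordered pairs (i < j) of populations
    i, j = np.triu_indices(len(means), k=1)
    divergence = np.mean((means[i] - means[j]) ** 2)

    diversity = np.mean(pops.var(axis=1))
    return divergence / diversity

# Illustrative data: 5 populations of 200 individuals with shifted trait means
rng = np.random.default_rng(1)
shifts = rng.normal(scale=2.0, size=(5, 1))
example = shifts + rng.normal(size=(5, 200))
ratio = divergence_diversity_ratio(example)
```

In practice such a ratio would be evaluated as a function of the divergence time between lineages; that dependence is not modelled in this sketch.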
MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories
Simulation-based inference enables learning the parameters of a model even
when its likelihood cannot be computed in practice. One class of methods uses
data simulated with different parameters to infer an amortized estimator for
the likelihood-to-evidence ratio, or equivalently the posterior function. We
show that this approach can be formulated in terms of mutual information
maximization between model parameters and simulated data. We use this
equivalence to reinterpret existing approaches for amortized inference and
propose two new methods that rely on lower bounds of the mutual information. We
apply our framework to the inference of parameters of stochastic processes and
chaotic dynamical systems from sampled trajectories, using artificial neural
networks for posterior prediction. Our approach provides a unified framework
that leverages the power of mutual information estimators for inference.
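The link between the likelihood-to-evidence ratio and mutual information can be written out in a few lines. This is a standard identity given here as a sketch; the symbols (parameters \theta, data x, critic f) are our notation, and the specific lower bounds proposed in the paper are not stated in the abstract. The Donsker-Varadhan bound is shown only as one well-known example.

```latex
% Likelihood-to-evidence ratio, equivalently posterior-to-prior ratio
r(\theta, x) = \frac{p(x \mid \theta)}{p(x)} = \frac{p(\theta \mid x)}{p(\theta)}

% Mutual information is the expected log-ratio under the joint distribution
I(\theta; x) = \mathbb{E}_{p(\theta, x)}\!\left[ \log r(\theta, x) \right]

% Example lower bound (Donsker-Varadhan), valid for any critic f(\theta, x);
% it is tight when f(\theta, x) = \log r(\theta, x) + \text{const}
I(\theta; x) \;\ge\; \mathbb{E}_{p(\theta, x)}\!\left[ f(\theta, x) \right]
  - \log \mathbb{E}_{p(\theta)\,p(x)}\!\left[ e^{f(\theta, x)} \right]
```

Maximizing such a bound over a parametrized critic therefore yields an estimate of the log-ratio, up to an additive constant.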
SOS: Online probability estimation and generation of T and B cell receptors
Recent advances in modelling VDJ recombination and subsequent selection of T
and B cell receptors provide useful tools to analyze and compare immune
repertoires across time, individuals, and tissues. A suite of tools (IGoR [1],
OLGA [2] and SONIA [3]) that allow for the inference of generative and
selection models from high-throughput sequencing data has been publicly
released to the community. However, using these tools requires some scripting or
command-line skills and familiarity with complex datasets. As a result, the
application of the above models has not been available to a broad audience. In
this application note we fill this gap by presenting Simple OLGA & SONIA (SOS),
a web-based interface where users with no coding skills can compute the
generation and post-selection probabilities of their sequences, as well as
generate batches of synthetic sequences. The application also functions on
mobile phones.
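The relation between the two quantities computed by the interface can be summarized as follows. This is a sketch in our own notation, assuming the usual factorization used in selection models of this kind; it is not quoted from the text above.

```latex
% Assumed relation between generation and post-selection probabilities
% of a receptor sequence \sigma, with Q a sequence-dependent selection factor
P_{\mathrm{post}}(\sigma) = Q(\sigma)\, P_{\mathrm{gen}}(\sigma),
\qquad
\sum_{\sigma} Q(\sigma)\, P_{\mathrm{gen}}(\sigma) = 1
```

Under this assumption, Q(\sigma) > 1 marks sequences enriched by selection relative to generation, and Q(\sigma) < 1 marks depleted ones.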